Genetic Dissection of Extreme Seed-Flooding Tolerance in a Wild Soybean PI342618B by Linkage Mapping and Candidate Gene Analysis

Seed-flooding stress is one of major abiotic constraints that adversely affects soybean production worldwide. Identifying tolerant germplasms and revealing the genetic basis of seed-flooding tolerance are imperative goals for soybean breeding. In the present study, high-density linkage maps of two inter-specific recombinant inbred line (RIL) populations, named NJIRNP and NJIR4P, were utilized to identify major quantitative trait loci (QTLs) for seed-flooding tolerance using three parameters viz., germination rate (GR), normal seedling rate (NSR), and electrical conductivity (EC). A total of 25 and 18 QTLs were detected by composite interval mapping (CIM) and mixed-model-based composite interval mapping (MCIM), respectively, and 12 common QTLs were identified through both methods. All favorable alleles for the tolerance are notably from the wild soybean parent. Moreover, four digenic epistatic QTL pairs were identified, and three of them showed no main effects. In addition, the pigmented soybean genotypes exhibited high seed-flooding tolerance compared with yellow seed coat genotypes in both populations. Moreover, out of five identified QTLs, one major region containing multiple QTLs associated with all three traits was identified on Chromosome 8, and most of the QTLs within this hotspot were major loci (R2 > 10) and detectable in both populations and multiple environments. Based on the gene expression and functional annotation information, 10 candidate genes from QTL “hotspot 8-2” were screened for further analysis. Furthermore, the results of qRT-PCR and sequence analysis revealed that only one gene, GmDREB2 (Glyma.08G137600), was significantly induced under flooding stress and displayed a TTC tribasic insertion mutation of the nucleotide sequence in the tolerant wild parent (PI342618B). GmDREB2 encodes an ERF transcription factor, and the subcellular localization analysis using green fluorescent protein (GFP) revealed that GmDREB2 protein was localized in the nucleus and plasma membrane. Furthermore, overexpression of GmDREB2 significantly promoted the growth of soybean hairy roots, which might indicate its critical role in seed-flooding stress. Thus, GmDREB2 was considered as the most possible candidate gene for seed-flooding tolerance.


Introduction
Natural disasters are becoming more often as the world's climate changes, resulting in a significant drop in crop production. Among these various abiotic stresses, flooding stress has emerged as the greatest threat to crop growth and development, affecting crop production and yield worldwide [1]. Due to excessive rainfall and poor drainage, the soil becomes flooded, causing hypoxia in below-ground parts and inhibiting plant growth [2]. Soybeans (Glycine max [L.] Merr.) are one of the most important commercial crops in the world due to its high nutritional values, containing protein, oil, carbohydrates, and isoflavones [3].
However, soybeans are relatively susceptible to flooding stress, with considerable growth and yield inhibition under flood conditions [4,5]. Flooding stress affects soybeans at different growth and development stages, including germination, vegetative, and reproductive stages, which results in a great loss in soybean yields [3]. Moreover, flooding imposes a severe selection pressure on plants since excess water in the living surroundings can deprive them of oxygen, carbon dioxide, and light [6]. Flooding stress is considered as a primary limiting factor in soybean cultivation because 60-70% of annual precipitation occurs during the growth and development period [7]. Additionally, several studies have displayed that flooding stress has a consistent impact on soybean growth and grain yields around the world, particularly in key soybean-producing countries such as China and United States [8]. For example, in the Mississippi Delta region, flooding stress reduced overall soybean yield by up to 25% [9]. Previous studies also focused on the understanding of flooding influences on different soybean growth stages and yield cutbacks. It is reported that flooding reduced soybean yields by 17-43% at the vegetative growth stage and 50-56% at the reproductive stage [8,10]. Similarly, Sullivan et al. (2001) exhibited a 20% yield loss when soybean plots were flooded for three days at V2 and V3 growth stages [11].
Soybean flooding tolerance is considered to be a complex quantitative trait. Hence, QTL mapping is an effective way to unravel the underlying genetic mechanism of the tolerance. 27 QTLs on Soybase (http://www.soybase.org) (accessed on 5 February 2023) have been reported to be linked to flooding tolerance at seedling stage in soybeans by linkage mapping approaches [12,13] For example, Zhang et al. [14] detected nine significant QTLs for flooding tolerance on chromosomes 1, 4, 5, 16, and 18 for both flooding tolerance scores (FTSs) and survival rates (SRs) using a 'Benning × PI 416937' recombinant inbred line population. Over 20 QTLs were also found in another RIL population 'Danbaekkong × NTS1116' [15,16]. These QTLs are distributed across 20 chromosomes in the soybean genome, indicating the complex QTL architecture for the soybean tolerance. However, some major loci such as qWT_Gm03 on chromosome 3 for waterlogging tolerance and Qrld-12 for root length development (RLD) or Qrsad-12 for root surface area development (RSAD) on chromosome 12 have been confirmed to promote the stress tolerance [17,18].
In contrast to waterlogging tolerance research of soybean plants with a focus on plant injury and yield loss, the effect of seed-flooding tolerance at germination stage has not been well determined. Due to the excessive rainfall at sowing time in some areas, soybean cultivars with seed-flooding tolerance at germination stage are desirable for breeding. A previous study found no significant differences in the evaluation of seed-flooding tolerance between laboratory and soil experiment [19]. It is noteworthy that four integrated QTLs named Sft1, Sft2, Sft3, and Sft4 were identified by evaluating germination rate (GR) and normal seedling rate (NS) at germination stage. Sft2 on Chromosome 08 exhibited large effects on seed-flooding tolerance. Interestingly, this QTL was near I genetic locus which controls the seed coat pigmentation [20]. A total of 33 QTLs including 26 main-effect ones for the seed-flooding tolerance parameter of relative seedling length were identified in two related RIL populations with a common tolerance parent M8206 by using a restricted two-stage multi-locus multi-allele genome-wide association study [21]. By using a genomewide association analysis (GWAS) method, 14 SNPs were found to be associated with the tolerance to waterlogging [22]. A total of 25 QTNs associated with GR, NS, and electric conductivity (EC) were detected in a germplasm population [23]. As one of the important indicators, the value of EC marked the intracellular substance leakage of soybean seeds under stress conditions, including flooding stress [24]. EC was also found to be closely related to the seed vigor, which reflects the seed germination and unhealthy seedling developments [25], and the germination test can be replaced by EC measurements carried out under soaking treatment [26,27]. Moreover, a total of 40 and 20 SNPs associated with GR and EC were found on a panel of 243 Plant Introduction from the USDA collection [28]. However, these QTLs are detected from a limited cultivated soybean background. It is essential to mine and detect stable quantitative trait loci (QTLs) and related candidate genes in more genetic backgrounds.
Compared with cultivated soybean (Glycine max (L.) Merr.), the annual wild soybean (Glycine soja Sieb. and Zucc.) displayed high genetic diversity and is acknowledged as the progenitor of cultivated soybean [29,30]. Previous studies demonstrated that the wild soybean not only has abundant oil [31] and protein content [32] but also exhibits strong adaptability and resistance/tolerance to various biotic and abiotic stresses [33][34][35][36], indicating that wild soybean can be used as the important source of genetic variability. Thus, we can improve the seed-flooding tolerance of cultivated soybean by introducing favorable alleles from wild soybean into elite cultivars. Wild soybean accessions were shown to be more waterlog-tolerant than cultivated soybean in a recent study [37]. Furthermore, the wild soybean accession PI342618B has demonstrated high seed-flooding tolerance [38], with the majority of seedlings surviving after a 140 h flooding treatment, and five candidate genes for the tolerance were screened out from differentially expressed genes of root tissue transcriptomic profile between PI342618B and a sensitive line [38]. By keeping the above in view, the objective of the present study was to identify QTLs for seed-flooding tolerance using two RIL populations derived by crossing common wild soybean accession PI342618B with two sensitive cultivated soybeans, NN86-4 and NN493-1, and then to mine candidate genes and elite alleles from Glycine soja in developing high seed-flooding tolerant soybean varieties.

Plant Materials
Two inter-specific RIL populations, NJIRNP and NJIR4P, consisting of 284 and 161 lines, respectively, were used for elucidating the genetic basis of seed-flooding tolerance. The NJIRNP and NJIR4P populations were derived by crossing the common seed-flooding tolerant wild soybean genotype PI342618B (G. soja) with two sensitive cultivated soybean varieties, NN86-4 (G. max) and NN493-1 (G. max). Parental accessions and two RIL populations were planted at the Jiangpu Experiment Station of Nanjing Agricultural University (32.12 • N, 118.37 • E), Jiangsu Province, China, in two consecutive growing seasons in 2013 and 2014, designated as JP13 and JP14. Both RIL populations were grown in a completely randomized block design with three replications.

Phenotypic Evaluation of Seed-Flooding Tolerance
For determining the optimum duration of seed-flooding treatment, 50 seeds were dipped into distilled water for different time points of 0, 2, 3, 4, 5, 6, and 7 days. Seeds of good quality were sterilized with 70% ethanol for 10 s to remove the contaminants. Then, seeds were rinsed with distilled water three times. After the water had evaporated naturally and the seeds became dry, flooding treatment was applied using 350 mL plastic cups with 100 mL distilled water at 25 • C and sterilized. Petri dishes were placed on top to cover the cups to prevent the water evaporation. The experiment was conducted in a completely randomized design with three replications. Based on the previous studies related to seedflooding tolerance, both RIL populations and parental accessions were phenotypically evaluated for germination rate (GR) and normal seedling rate (NSR) [24]. Moreover, the electrical conductivity (EC) represented the content of seed substance leakage [25], and it was closely related to seed germination and survival rate [26,27]. Immediately after the treatment, a conductivity meter (model: DDS-307A) was utilized to record the EC value of steep-water plastic cups. A germination experiment was carried out using paper roll method [39]. Then, all the seeds were grown for 5 days in normal conditions, and the number of germinated seeds and normal seedlings were recorded. Seeds with radicle length more than 1 cm were regarded as germinated. Seedlings with normal cotyledon, radicle, and root were regarded as the normal seedlings. For control, seeds without flooding treatment were grown in the same conditions. The GR and NSR were estimated using the following equation: GR = (germination rate under flooding treatment)/(germination rate in control) × 100% NSR = (normal seedling rate under flooding treatment)/(normal seedling rate in control) × 100%

Analysis of Phenotypic Data
The analysis of variance (ANOVA) and descriptive statistics, including mean, range, standard deviation (SD), skewness, kurtosis, and correlations among three traits, were calculated using the SAS PROC generalized linear model (GLM), PROC UNIVARIATE, and PROC CORR programs, respectively [40]. The broad-sense heritability (h 2 ) of RIL populations was estimated using the following equation: where σ 2 G is the genotypic variance, σ 2 GE is the variance of the genotype-by-environment interaction, σ 2 e is the error variance, n is the number of environments, and r is the number of replications within an environment [41].

QTL Mapping Analysis
Two high-density genetic linkage maps of NJRINP and NJRI4P populations contain 5728 and 4354 bin markers, respectively, constructed based on 89,680 and 80,995 single nucleotide polymorphisms (SNPs), and these SNPs were previously reported by Wang et al., 2016. Moreover, the total genetic distance covered was 2204.6 and 2136.7 cM for NJRINP and NJRI4P populations, with an average distance of 0.4 and 0.5 cM between neighboring bins, respectively [42]. The information of genetic and physical positions of all bin markers was presented (Supplementary Table S1).
The QTL analysis was performed via WinQTLCart 2.5 software [43] and QTLNetwork 2.2 [44]. For the WinQTLCart 2.5 software, the model of composite interval mapping (CIM) was used with a 10 cM window at a walking speed of 1 cM. The LOD threshold was calculated using 1000 permutations for an experimental-wise error rate of p = 0.05 to determine whether the QTL was significantly associated with traits [45]. For QTLNetwork 2.2, the model of mixed linear composite interval mapping (MCIM) was utilized to identify major additive effects QTLs, epistatic QTLs (AA), and genotype-by environment interaction effects (additive by environment (AE) and AA by environment (AAE)) [46].

qRT-PCR and Sequence Analysis
Soybean genomic data(Williams82 version) was downloaded from the Phytozome (http://phytozome.jgi.doe.gov) (accessed on 12 April 2014), SoyBase (http://www.soybase. org) (accessed on 13 April 2014), and SoyKB (http://soykb.org/) (accessed on 13 April 2014) websites. According to the physical regions related to seed-flooding, candidate genes were extracted from the predicted gene list based on the gene annotations (http: //www.soybase.org) (accessed on 7 May 2014). The CDS sequencing was performed to verify the nucleotide mutation of candidate genes in three parental accessions, PI342618B, NN86-4, and NN493-1. The root parts of seedlings were collected for RNA extraction and qRT-PCR analysis. Total RNA was extracted by using the RNA Simple Total RNA kit (TIANGEN, Beijing, China) by following the manufacturer's protocol. The genes were amplified by using Phanta ® Max Super Fidelity DNA Polymerase from Vazyme and sent to Generalbiol for sequencing. Amino acid homology alignment was analyzed using DNAMAN and BioXM 2.6. cDNA was synthesized using the Prime ScriptTM RT Reagent Kit (TaKaRa, Shiga, Japan) according to the manufacturer's instructions. qRT-PCR(SYBR Green, FP207) was conducted to validate the differential expression levels of the candidate genes among three parental accessions. The experiment was run with three biological replicates, and an actin3-expressing gene was used as the internal control. All the primers were designed by Vector NTI 11.5 (Supplementary Table S2).

Vector Construction and Subcellular Localization
The method of homologous recombination was used for the vector construction in this study. The primers were designed based on the nucleotide sequence of GmDREB2 (Glyma.08g137600) and the XhoI restriction site in pJRH0641-GFP (Supplementary Table S2). The full-length coding region of GmDREB2 was introduced to pJRH0641-GFP for overexpression of GmDREB2 driven by the CaMV 35S promoter (35S::GmDREB2-GFP), and the empty vector containing 35S:GFP was used as a control. The construction of the recombinant plasmid was conducted using In-Fusion ® HD Cloning Plus (TaKaRa, Japan) by following the manufacturer's protocol. Then, the recombinant plasmid was transformed into E.coli DH5α and finally transformed into the Agrobacterium tumefaciens strain EHA105 by the electroporation method. The transient expression was carried out via the infection of Agrobacterium liquids containing the recombinant plasmid of pJRH0641-GmDREB2::GFP in young N. benthamiana leaves (5 weeks-old). After 36 h of the infection, the fluorescence in N. benthamiana leaves was detected using a confocal laser scanning microscope (Zeiss LSM780, Oberkochen, Germany).

Transformation of Soybean Hairy Roots
The 35S::GmDREB2-GFP overexpression vector was transformed into the Agrobacterium rhizogenes strain K599 for hairy root transformation by the electroporation method [42]. The cotyledonary node of young seedlings (one-week-old) with unfolded cotyledons were infected with Agrobacterium rhizogenes carrying the 35S::GmDREB2-GFP. The seedlings were transferred to the germination chamber with 16 h light/8 h dark cycle, at 25 • C, and with high humidity (>90%). When the emerged hairy roots grew up to 5-10 cm after approximately 3 weeks, the primary roots were removed. The transgenic positive hairy roots were screened by detecting the GFP signals using the fluorescence microscope. Then, the transgenic positive hairy roots were grown under hydroponics cultivation using 1/2 Hoagland nutrient solution for five days and finally treated with flooding stress for 4d. The fresh weight of hairy roots was tested by the analytical balance.

Determination of Optimum Seed-Flooding Treatment Time Point
Optimum seed-flooding treatment was determined by the evaluation of germination rate (GR), normal seedling rate (NSR), and electrical conductivity (EC) for three parents, viz., NN86-4, NN493-1, and PI342618B at different treatment time points (0, 2, 3, 4, 5, 6, and 7 days). Both GR and NSR of NN86-4 and NN493-1 were markedly decreased after 3 days of treatment, while most seeds of PI342618B germinated very well ( Figure 1A,B), The GR and NSR of NN86-4 and NN493-1 were continuously reduced with the increasing treatment time after 4 days of treatment, and they were both maintained at a low level (<15%) after 5 days of treatment. In contrast, the EC of NN86-4 and NN493-1 rapidly increased along with the treatment time, suggesting more leakage of electrolytes from the seeds, while the EC of PI342618B was constantly maintained at a lower level ( Figure 1C). Furthermore, the phenotype and germination performance of three parents are shown in Figure 2. The seedling growth of PI342618B was considerably better than that of NN86-4 and NN493-1 under 4 d flooding treatment ( Figure 2C), indicating that it was advantageous to differentiate the performance of these three parents under 4d treatment. In conclusion, the above three germination-related parameters revealed a significant difference between seed-flooding tolerant (PI342618B) and sensitive (NN86-4 and NN493-1) soybean parents under 4 d flooding treatment (Figures 1 and 2). Hence, we selected 4 days as the optimum flooding treatment time for the evaluation of seed-flooding tolerance in the present study.
In conclusion, the above three germination-related parameters revealed a significant difference between seed-flooding tolerant (PI342618B) and sensitive (NN86-4 and NN493-1) soybean parents under 4 d flooding treatment (Figures 1 and 2). Hence, we selected 4 days as the optimum flooding treatment time for the evaluation of seed-flooding tolerance in the present study.  In conclusion, the above three germination-related parameters revealed a significant difference between seed-flooding tolerant (PI342618B) and sensitive (NN86-4 and NN493-1) soybean parents under 4 d flooding treatment (Figures 1 and 2). Hence, we selected 4 days as the optimum flooding treatment time for the evaluation of seed-flooding tolerance in the present study.

Evaluation of Phenotypic Variation
The seed size of NN86-4 and NN493-1 was much bigger than that of PI342618B. The external appearance of parental seeds after 4 d of flooding treatment in the plastic cups is presented in Figure 2B. The phenotypic data analysis of three parents is shown in Table 1. The GR and NSR of PI342618B were significantly higher relative to those of NN86-4 and NN493-1 in both JP13 and JP14 environments, whereas the EC of PI342618B was considerably lower. These results indicated that PI342618B was more tolerant to seed-flooding stress compared to cultivated parents NN86-4 and NN493-1, and NN86-4 was relatively more tolerant than NN493-1. Additionally, the phenotypic performance of NJRINP and NJRI4P populations was observed ( Table 2). Descriptive statistics, ANOVA (F-value), and estimates of heritability were performed for all three traits related to seedflooding tolerance in both populations under JP13 and JP14 environments ( Table 2). The heritability estimate (h 2 ) of GR, NSR, and EC across both years ranged from 0.67 to 0.89. High positive significant correlations (r = 0.89 to 0.96, p < 0.001) were observed between GR and NSR across environments in both populations, while both GR and NSR were negatively correlated with EC (r = −0.31 to −0.81, p < 0.001) ( Table 3). The ANOVA results displayed that the difference between RILs for both populations was highly significant for all three traits (p < 0.01) (Supplementary Table S3). The environmental differences and genotype-by-environment (G × E) interaction effects were also significant for these three traits (p < 0.01).   The datapoints under and above the diagonal represent r values of Jiangpu 2013 on the left and Jiangpu 2014 on the right, respectively. *** indicates significant level at p < 0.001.

QTL Mapping of Seed-Flooding Tolerance by CIM
The QTL identification of GR, NSR, and EC was performed using high-density genetic maps of NJIRNP and NJIR4P populations. As shown in Table 4, 25 QTLs explaining 3.53-30.58% of the phenotypic variation (R 2 ) associated with seed-flooding tolerance were detected via CIM in different populations and environments. Eight QTLs associated with GR were identified on five chromosomes. Among these QTLs, two were located on Chromosome 6; three were located on Chromosome 8, and the remaining were located on Chromosome 1, Chromosome 17, and Chromosome 20, respectively (Table 4). These eight QTLs explained 3.71 to 30.58% of phenotypic variance (PV). Notably, the most prominent QTL with the highest LOD score (25.14) was identified on Chromosome 8, which is contained in qGR-8-2, explaining 30.58% of PV. Additionally, qGR-8-2 was further detected in both populations and environments. For NSR, seven QTLs were identified on three chromosomes. Out of these QTLs, two were located on Chromosome 6, three were located on Chromosome 8, and the remaining two were located on Chromosome 15. The QTL with the highest LOD score (23.56) was identified in the qNSR-8-2 region, explaining 28.93% of PV. Moreover, ten QTLs for EC were identified on seven chromosomes. Out of these QTLs, qEC-8-1, qEC-8-2, and qEC-8-3 detected on Chromosome 8 were located in the same mapping region as detected for GR and NSR. In addition, qEC-8-2 was identified in both populations as well as in different environments, explaining 6.63-21.12% of PV, and one QTL in the qEC-8-2 region displayed the largest effect with R 2 = 21.12%. The remaining seven QTLs were only detected in a single population or environment (Table 4).
All the QTLs identified for GR and NSR in both populations displayed negative effects, and all of the positive alleles were from the tolerant parent PI342618B. Similarly, all the QTLs for EC revealed positive effects, suggesting the positive alleles that reduced the electrolyte leakage from seeds came from tolerant wild parent. By comparing the physical regions of QTLs for three traits detected on Chromosome 8, we found that all these QTLs were extremely close and located in three genomic regions, indicating that there were QTL clusters/hotspots related to seed-flooding tolerance on Chromosome 8. Moreover, most of these QTLs within these clusters displayed large effects with R 2 > 10%, suggesting that Chromosome 8 was most likely rich in genes involved in seed-flooding tolerance in soybean.

QTL Mapping by MCIM and Comparative Analysis of CIM and MCIM Methods
To further validate the QTLs detected by CIM, we performed MCIM to dissect the additive QTLs. A total of 18 major additive QTLs for GR, NSR, and EC were identified in both populations, and these QTLs were distributed on seven chromosomes (Table 5). These additive QTLs explained 2.0-30.15% of the PV. For GR, five QTLs were identified, including qGR-8-1 and qGR-8-3 detected in same genomic regions on Chromosome 8 as reported through CIM. The qGR-8-1 was identified in both populations, accounting for 20.42% and 20.60% of the PV, respectively. For NSR, four additive QTLs were detected and one of these QTLs; qNSR-8-1 exhibited large additive effects and explained 30.15% and 24.00% of the PV in NJIRNP and NJIR4P, respectively. For EC, nine QTLs were identified, and all of these QTLs were only detected in a single population either NJIRNP or NJIR4P. Among them, qGR-8-1 and qGR-8-2 were located in the same genomic regions on Chromosome 8 as reported through CIM. Interestingly, we observed that stable QTLs (qGR-6-1, qGR-6-2, qNSR-6-1, qNSR-6-2, qEC-6-1 and qEC-6-2) were mapped in the same regions on Chromosome 6 only in the NJIR4P population, and most of these QTLs can be verified through CIM. Additionally, all the positive alleles of QTLs for GR and NSR were derived from wild parent PI342616B, while the additive effects of QTLs for EC were negative, indicating that the positive alleles of QTLs for EC were from cultivated parents NN86-4 and NN493-1.  Furthermore, the epistasis and QTL-by-environment (Q×E) interactions were performed in this study. A total of four pairs of digenic epistatic QTLs were identified for NSR and EC (Table 6). In the NJIRNP population, only one pair of epistatic QTLs for EC was identified, and it consisted of two non-additive effect QTLs (qEC-9-1 and qEC-20-1), explaining 3.42% of the PV. Moreover, in the NJIR4P population, three pairs of epistatic QTLs were detected. Out of these, one pair was identified for NSR which consisted of two non-additive QTLs (qNSR-3-1 and qNSR-9-1), accounting for 5.03% of the PV. The remaining two pairs were identified for EC, and one pair consisted of two additive QTLs, viz., qEC-4-1 and qEC-8-1, explaining 3.20% of the PV. Another pair was composed of two additive qEC-3-1 and qEC-15-1 and explained 2.54% of the PV. In conclusion, we performed a comparative analysis of major QTLs associated with seed-flooding tolerance through CIM and MCIM. A total of 25 and 18 QTLs were identified by these two methods. Among these QTLs, 12 QTLs were commonly detected by both methods, indicating that these QTLs were stable and reliable. It is noteworthy that an important genomic region for seed-flooding tolerance was identified on Chromosome 8, which contains major QTLs for all three traits GR, NSR, and EC detected in both populations and environments by two methods. Furthermore, we also found stable genomic regions harboring QTLs for all three of the above traits on Chromosome 6, and these QTLs could be validated by both CIM and MCIM. However, these QTLs on Chromosome 6 were only identified in NJIR4P population, and most of these QTLs exhibited minor effects (R 2 < 10%). In summary, we obtained a stable major genomic region associated with all three traits (GR, NSR, and EC) on Chromosome 8, and this region was verified in various populations and environments, as well as different mapping methods. Hence, this critical genomic region on Chromosome 8 can be considered as a potential candidate genomic region for breeding high seed-flooding tolerant soybean.

QTL Hotspots for Seed-Flooding Tolerance
QTL clusters/hotspots are defined as densely populated QTL regions on the chromosome that contain multiple QTLs associated with various traits. Based on the genetic position of major QTLs of GR, NSR, and EC, we observed that QTLs related to these traits were consistently located in the same or similar genomic regions. By comparing the mapping results using CIM, QTL hotspots on Chromosome 6 and Chromosome 8 were identified (Table 7). We found three QTL "hotspots 8-1, 8-2 and 8-3" on Chromosome 8, and these regions were adjacent to each other but non-overlapping ( Table 7). The QTL "hotspot 8-1" harbors three QTLs related to all three traits and covers the physical length of 1.13 Mb. Likewise, QTL "hotspot 8-2" contains three QTLs within the physical interval of 1.35 Mb on Chromosome 8. Moreover, QTL "hotspot 8-3" possessed two QTLs within a physical genomic interval of 0.75 Mb. Among these three QTL hotspots on Chromosome 8, QTL "hotspot 8-2" containing multiple QTLs associated with all three traits has been detected in all studied environments and populations as well as mapping methods (Tables 4 and 5), while "hotspots 8-1 and 8-3" were only identified in a single population. Furthermore, all the QTLs underlying QTL "hotspot 8-2" were major QTLs with R 2 > 10% (Table 7). In addition, two QTL "hotspots 6-1 and 6-2" on Chromosome 6 containing two/three QTLs related to two/three studied traits were only detected in the NJIR4P population under a single environment, and all the QTLs underlying these two QTL hotspots displayed minor effects (R 2 < 10%) ( Table 7). In conclusion, the major QTL "hotspot 8-2" located on Chromosome 8 was considered as the major genomic region governing the inheritance of seed-flooding tolerance in soybean and can be utilized as the major targets for fine mapping, candidate gene identification, and marker-assisted breeding.

Genetic Association between Seed-Flooding Tolerance and Seedcoat Pigmentation
An earlier study reported that seed-coat color was closely linked with seed-flooding tolerance, and varieties with a pigmented seed-coat displayed higher tolerance than those with yellow seed-coat [19]. In this study, we observed that a large proportion of NJIRNP and NJIR4P populations showed various pigmented seed-coat colors, including black, brown, green, and other mixed colors. To identify the potential association between seed-flooding tolerance and seed-coat pigmentation, the effects of seed-coat color on seedflooding tolerance were investigated. The results showed that the GR and NSR of the genotypes with pigmented seed-coat in both NJIRNP and NJIR4P were significantly higher than those of genotypes with yellow seed-coat (Table 8). Additionally, the GR and NSR of the genotypes with black color were considerably higher compared to the genotypes with other pigmented colors. Moreover, the EC of pigmented varieties was remarkably lower than that of yellow varieties, suggesting that less seed electrolyte leakage occurs in pigmented soybean genotypes under flooding treatment. In conclusion, all of this evidence supports the view that varieties with pigmented seed-coat tended to be more seed-flooding tolerant and genotypes with black seed-coat color displayed the largest tolerance. In soybean, the I locus plays a critical role in the inhibition of seed-coat pigmentation, and it was identified as a genomic region of unusual cluster arrangement of three chalcone synthase genes, viz., CHS1, CHS3, and CHS4 on Chromosome 8 [47].
Interestingly, the QTL hotspots on Chromosome 8 identified in the present study associated with seed-flooding tolerance were also detected near the I locus. Based on the close association between seed-flooding tolerance and seed-coat pigmentation described above, we hypothesized that these three CHS genes might be the candidate genes for seedflooding tolerance. To further clarify the sequence differences of CHS1, CHS3, and CHS4, we sequenced these genes in PI342616B, NN86-4, and NN493-1. Subsequently, two, six, and one nucleotide variations were identified in CHS1, CHS3, and CHS4, respectively (Supplementary Table S4). However, all these nucleotide variations were synonymous mutations and did not result in any changes to the amino acid sequence of the corresponding proteins (Supplementary Figure S1). By combining the above results, it was revealed that the QTL hotspots and I locus were located in the same genomic region on Chromosome 8, but CHS1, CHS3, and CHS4 were not the candidate genes involved in seed-flooding tolerance.

qRT-PCR and Sequence Analysis of Candidate Genes
As the major QTL "hotspot 8-2" associated with all traits was consistently identified by both methods as well as in multiple environments and populations, the candidate gene prediction analysis was further performed in this region. All the model genes and their annotations were downloaded from Phytozome (https://phytozome.jgi.doe.gov) (accessed on 17 May 2014), Soybase (http://www.soybase.org) (accessed on 17 May 2014), and SoyKB (http://soykb.org/) (accessed on 18 May 2014), databases, and a total of 164 genes were identified within QTL "hotspot 8-2". In order to identify key candidate genes controlling seed-flooding tolerance, the expression levels of these 164 genes in different soybean tissues were examined using the RNA-seq data from Soybase (https://www.soybase.org/soyseq/) (accessed on 24 May 2014), ( Figure S2). As the root was an important plant organ in response to flooding stress [48], candidate genes with high expression in roots were selected. Additionally, previous study revealed that flooding tolerance was potentially associated with carbohydrate catabolism and cell wall loosening [49]. Moreover, genes encoding glycosyl transferase proteins, cytochrome P450, and ERF/AP2 transcription factors were reported to be involved in flooding stress during germination [50]. By combining the analysis of gene expression, functional annotation, and the published literature, a total of ten candidate genes were screened (Supplementary Table S5). To confirm the expression of these genes under flooding treatment, we performed qRT-PCR analysis. The results revealed that five of these predicted candidate genes, Glyma.08G122400, Glyma.08g125100, Glyma.08G137600, Glyma.08g145300, and Glyma.08G149200 were significantly induced under flooding treatment compared to the control in all three parental accessions (Figure 3), indicating that these genes may play key roles in response to flooding stress. Moreover, all these three up-regulated genes displayed relatively high expressions in seed-flooding tolerant parent PI342618B compared to sensitive parents (NN86-4 and NN493-1). Based on these findings, these three differentially expressed genes were initially considered as the possible candidate genes for seed-flooding tolerance.  To further verify the above hypothesis, we performed a sequence analysis of the fulllength coding region of these three candidate genes in three parents, viz., PI342616B, NN86-4, and NN493-1. By comparing the sequence differences, a single base mutation (G/A) in the CDS region of Glyma.08G122400 in wild parent PI342616B was observed (Supplementary Figure S3A), and this synonymous mutation did not result in any changes in amino acid (Supplementary Figure S3B). Additionally, Glyma.08G137600 exhibited three To further verify the above hypothesis, we performed a sequence analysis of the full-length coding region of these three candidate genes in three parents, viz., PI342616B, NN86-4, and NN493-1. By comparing the sequence differences, a single base mutation (G/A) in the CDS region of Glyma.08G122400 in wild parent PI342616B was observed (Supplementary Figure S3A), and this synonymous mutation did not result in any changes in amino acid (Supplementary Figure S3B). Additionally, Glyma.08G137600 exhibited three base insertions (-/TTC) in PI342616B ( Figure 4A). This mutation caused the insertion of serine in the low complexity region of protein ( Figure 4B,C). However, no nucleotide changes were detected in Glyma.08g125100, Glyma.08g145300, and Glyma.08G149200. Based on the above results of qRT-PCR and sequence analysis, Glyma.08G137600 was considered as the most possible candidate gene for seed-flooding tolerance, and this gene was designated as GmDREB2 for further study.

Subcellular Localization and Transgenic Hairy Roots Verification of GmDREB2
To reveal the subcellular localization of the GmDREB2 protein, we performed th transient expression of 35S::GmDREB2-GFP in fusion with GFP in N.benthamiana leaves Based on the prediction of subcellular localization on SoftBerry (http://linux1.soft berry.com/), GmDREB2 was localized in the nucleus and plasma membrane. As shown in Figure 5A, the green fluorescence of GmDREB2-GFP was mainly distributed in the nucleu

Subcellular Localization and Transgenic Hairy Roots Verification of GmDREB2
To reveal the subcellular localization of the GmDREB2 protein, we performed the transient expression of 35S::GmDREB2-GFP in fusion with GFP in N.benthamiana leaves. Based on the prediction of subcellular localization on SoftBerry (http://linux1.softberry. com/), GmDREB2 was localized in the nucleus and plasma membrane. As shown in Figure 5A, the green fluorescence of GmDREB2-GFP was mainly distributed in the nucleus and plasma membrane, which confirmed that GmDREB2 protein was localized to the nucleus and plasma membrane. stress. Moreover, the fresh weight of the GmDREB2-OE hairy roots was significantly heavier than that of the control ( Figure 5C). These preliminary results reveal that the overexpression of GmDREB2 can promote the growth of soybean hairy roots under flooding stress. However, further validation is still required to verify the molecular function of GmDREB2 in seed-flooding tolerance in soybeans. Additionally, to investigate the role of GmDREB2 under flooding stress, 35S::GmDREB2-GFP was generated for overexpression (GmDREB2-OE) analyses in the hairy roots. The transgenic soybean hairy roots were generated via the Agrobacterium rhizogenes-mediated hairy root transformation system [51], and the susceptible cultivated parent NN493-1 was used as the transgenic receptor material. In order to avoid the wilting of hairy roots, all the transgenic hairy roots were immersed in the liquid, and no difference in the hairy roots was observed between GmDREB2-OE and the control before treatment. As shown in Figure 5B, a significant difference was found, and the hairy roots of GmDREB2-OE displayed relatively better growth than the control after 4 d of flooding stress. Moreover, the fresh weight of the GmDREB2-OE hairy roots was significantly heavier than that of the control ( Figure 5C). These preliminary results reveal that the overexpression of GmDREB2 can promote the growth of soybean hairy roots under flooding stress. However, further validation is still required to verify the molecular function of GmDREB2 in seed-flooding tolerance in soybeans.

Discussion
Soybean was found to be relatively sensitive to flooding stress during germination, vegetative, and reproductive stages in a previous study [52]. It is an effective way to reduce the impact of flooding stress on soybean production by screening elite flooding tolerant germplasm and breeding high tolerant varieties [53]. However, the flooding tolerance of soybeans is a complex quantitative trait governed by multiple loci/genes with high environmental influence [14,20]. Over the past decades, 27 QTLs have been detected and reported on SoyBase using bi-parental mapping populations, which are distributed across the 20 chromosomes in the soybean genome, with several more recently detected [14,16].  [23], respectively. The two separate models for germination rate (GR), electrical conductivity (EC), and normal seedling rate (NSR) detected QTN13 on chromosome 13. Glyma.13g248000 (GmSFT), a QTN13-predicted candidate gene, was discovered to have a nonsynonymous mutation in seed-flooding tolerant genotypes, leading to an amino acid substitution in the protein. However, some major QTLs could be effectively utilized in breeding improved flooding-tolerant varieties or identifying candidate genes using marker-assisted selection (MAS). In this context, to better dissect the genetic basis of seedflooding tolerance in soybean, two inter-specific high-density linkage maps of NJRINP and NJRI4P populations were utilized to identify major and stable QTLs related to seedflooding tolerance using three parameters (GR, NSR, and EC) at the germination stage. Furthermore, two different methods, CIM and MCIM, were performed to improve the accuracy of mapping results. A total of 25 and 18 QTLs were detected by CIM and MCIM, respectively. Among these QTLs, 12 common QTLs were verified through both CIM and MCIM, indicating that these QTLs were stable and could be effectively utilized as potential candidate regions for enhancing seed-flooding tolerance. Interestingly, the mapping results displayed that QTLs related to various traits were distributed as clusters, and QTL hotspots of seed-flooding tolerance were identified on Chromosome 6 and Chromosome 8. We detected two QTL hotspots "6-1 and 6-2" on Chromosome 6. All the QTLs contained in hotspot "6-1"and hotspot "6-2" were only identified in a single population (NJIR4P) and explained small phenotypic variation with R 2 < 10.0%, indicating that these loci were minor QTLs related to seed-flooding tolerance.
Among the identified QTL hotspots, QTL hotspot "8-2" possessing QTLs associated with all three traits was detected in both populations and environments, while QTLs within hotspot "8-1"and hotspot "8-3" were only detectable in single populations (NJIR4P or NJIRNP). Furthermore, QTL hotspot "8-2" was validated by both CIM and MCIM, and all the QTLs contained in "hotspot 8-2" displayed large effects on seed-flooding tolerance (R 2 > 10.0%). Some previous studies also displayed major QTLs related to seed-flooding tolerance in this map region [20,28]. For example, Sayama et al. (2009) identified sft2 near the I locus on chromosome 8 (LG_A2) [20], which is involved in seed coat pigmentation. Therefore, it was concluded that QTL hotspot "8-2" was the most likely candidate region for seed-flooding tolerance. Besides the QTL hotspots detected on Chromosome 6 and Chromosome 8, we also identified some QTLs on other chromosomes, including Chromosome 1, Chromosome 3, Chromosome 15, and so on. By comparing the genomic position of these QTLs with the previously identified loci, qGR-1-1 and qEC-1-1 are overlapped with the Flood tolerance 6-1 locus coded on the SoyBase website detected by Rizal and Karki [54], and the region of qEC-17-1 on Chromosome 17 also includes QTL SFT_17-52 and RSL-a-17-1a found by Dhungana et al. [16] and Ali et al. [21].
The genetic makeup of complex traits is not only regulated by main-effect QTLs/genes but also by inter-locus interactions as well as their interactions with the environment [55]. Understanding the epistasis and QTL × environment (QE) interaction effects are essential for revealing the crucial genetic basis for determining quantitative traits and improving QTL detection and selection accuracy. Hence, ignoring inter-genic interaction will result in an overestimation of individual QTL effects and an underestimation of genetic variance [56]. In this study, four pairs of digenic epistatic QTLs were detected, three for EC and one for NSR. Out of these, only one epistatic QTL pair (qEC-4-1 and qEC-8-1) displayed individual additive effects, while the remaining three epistatic QTL pairs showed non-additive effects. These three epistatic QTL pairs might serve as modifying genes that themselves have no significant effects but regulate the expression of flooding-tolerance related genes through epistatic interactions. All four pairs have significant AA effects. However, the total PVE explained by four epistatic pairs related to seed-flooding tolerance was about 14.19%. Similar results for the epistatic interaction of QTLs for different traits in soybean have been reported in early studies [57,58]. The presence of epistatic interactions for a given trait makes selection more difficult. Interestingly, all main-effect QTLs identified in the present study, except for qEC-4-1 and qEC-8-1, had no epistatic interactions, which increased the heritability of the trait and made selection easier.
A previous study found a strong correlation between seed-flooding tolerance and seed-coat color in soybean [19]. Moreover, early study also demonstrated that the seedflooding of pigmented genotypes was significantly more tolerant compared to genotypes with yellow seed-coat [20]. Therefore, these results potentially indicated that Sft2 and I were the same locus or that they possessed strong linkage with each other [20]. Similar results were observed in the present study that genotypes with pigmented seed-coat (black, brown, green, and other pigmented colors) exhibited more seed-flooding tolerance than those with yellow seed-coat in NJIRNP and NJIR4P populations. These findings provide further evidence that the seed-coat pigmentation is strongly associated with seed-flooding tolerance. Interestingly, three identified QTL "hotspots 8-1, 8-2 and 8-3" for seed-flooding tolerance in our study were located near the I locus on Chromosome 8. It is known that the I locus controlling seed-coat pigmentation encompasses a region of three duplicated chalcone synthase genes (CHS1, CHS3, and CHS4) in soybean [59]. Thus, to exclude the possibility that these three CHS genes were the candidate genes involved in seed-flooding tolerance, we performed the sequence analysis of CHS genes between PI342616B and NN86-4, NN 493-1. The results revealed no changes were observed in the amino acid sequence of CHS proteins among parents, indicating that the CHS pathway may be independent of seed-flooding tolerance. Furthermore, our additional evidence supported this viewpoint because we observed many accessions with the yellow seed-coat exhibiting high GR and NSR in both populations in the present study.
So far, limited studies have been carried out to identify candidate genes for seedflooding tolerance in soybean. In this regard, mining of the candidate genes for seedflooding tolerance in soybean revealed 164 model genes within the physical intervals of QTL "hotspot 8-2" on Chromosome 8 in the present study. Based on the expression levels of these genes in different soybean tissues from Soybase (https://www.soybase.org/soyseq/) and the functional annotations related to seed-flooding tolerance, ten candidate genes were screened. The qRT-PCR and sequence analysis were further performed, and the results revealed that Glyma.08G137600 was significantly induced under 4 d of flooding stress in all three parents (PI342618B, NN86-4, and NN493-1) compared with the control. Moreover, three consecutive base insertions (-/TTC) in the coding region of PI342616B were observed, and this mutation resulted in the insertion of serine in the low complexity region of protein.
These results indicated that Glyma.08G137600 was the most likely possible candidate gene for seed-flooding tolerance in soybean, and this gene was designated as GmDREB2. The functional annotation of GmDREB2 exhibited that it encodes the ERF transcription factor, which contains the AP2 domain. In rice, an ethylene response factor (ERF) gene named Sub1A was characterized and involved in submergence tolerance [60]. Moreover, two ethylene response factors SNORKEL1 and SNORKEL2, which play vital roles in response to deepwater stress by ethylene signaling, were identified [61]. In Arabidopsis, it was demonstrated that three ERF-VIIII transcription factors (RAP2.2, RAP2.3, and RAP2.12) were involved in regulating and activating the core anaerobic response [62]. All the above functional studies illustrate the important role of the ERF transcription factor in flooding stress. Thus, GmDREB2 was considered the most likely candidate gene for seedflooding tolerance in soybean in the present study. The results demonstrated that the overexpression of GmDREB2 relatively promoted the growth of soybean hairy roots under flooding stress, indicating that GmDREB2 was positively involved in the regulation of seed-flooding tolerance in soybean. However, it requires more verification experiments to validate the molecular function of GmDREB2 for its specific roles in seed-flooding tolerance.
In the present study, we reported that the wild parent PI342616B exhibited strong seed-flooding tolerance compared to the sensitive cultivar parents NN86-4 and NN493-1 under flooding stress. Moreover, all the favorable positive alleles of identified QTLs were derived from PI342616B, which supported the view that wild soybean gene pool is an abundant source of elite stress-tolerant genes. Wild soybean, as the progenitor of cultivated soybean, has been reported to display high contents of oil and protein, rich in genetic diversity, strong adaptability, and tolerance to various stresses [21][22][23]. Due to the favorable tolerance alleles contributed by wild soybean, it has become an efficient way to improve the seed-flooding tolerance of elite soybean varieties by introducing the seed-flooding tolerant QTLs/genes from wild soybean into elite cultivars.

Conclusions
The present study was a detailed investigation into uncovering the genetic basis of seed-flooding tolerance in soybean, in which we used high-density inter-specific genetic maps of two RIL populations (NJIRNP and NJIR4P) for identifying QTLs associated with seed-flooding tolerance. In total, we identified 25 and 18 QTLs by CIM and MCIM, respectively. Most of the identified QTLs were novel loci, and all the positive alleles were derived from wild parent PI342618B. Interestingly, one major QTL "hotspot 8-2" containing many QTLs related to all three traits was identified on Chromosome 8. Based on the gene expression and function annotation analysis of model genes within the QTL "hotspot 8-2", ten candidate genes were selected. The results of qRT-PCR and sequence analysis demonstrated that GmDREB2(Glyma.08G137600) was the most possible candidate gene responsible for seed-flooding tolerance in soybean. GmDREB2 encodes an ERF transcription factor and is localized in the nucleus and plasma membrane. Moreover, the overexpression of GmDREB2 significantly promoted the growth of soybean hairy roots under flooding stress, indicating that GmDREB2 plays an important role in the regulation of seed-flooding tolerance in soybean. However, it needs further validation to elucidate the functional roles of GmDREB2 in seed-flooding tolerance. Finally, our findings will be useful for marker-assisted breeding of elite seed-flooding tolerant soybean varieties, as well as increasing our understanding of the genetic control of seed-flooding tolerance in soybean.